<a id="camel.datagen.self_improving_cot"></a>

<a id="camel.datagen.self_improving_cot.SelfImprovingCoTPipeline"></a>

## SelfImprovingCoTPipeline

```python
class SelfImprovingCoTPipeline:
```

Pipeline for generating self-taught reasoning traces
using the self-improving methodology.

This implements the STaR paper's approach of:
1. Initial reasoning trace generation
2. Self-evaluation
3. Feedback-based improvement
4. Iterative refinement

<a id="camel.datagen.self_improving_cot.SelfImprovingCoTPipeline.__init__"></a>

### __init__

```python
def __init__(
    self,
    reason_agent: ChatAgent,
    problems: List[Dict],
    max_iterations: int = 3,
    score_threshold: Union[float, Dict[str, float]] = 0.7,
    rejection_sampling_n: Optional[int] = None,
    evaluate_agent: Optional[ChatAgent] = None,
    reward_model: Optional[BaseRewardModel] = None,
    output_path: Optional[str] = None,
    few_shot_examples: Optional[str] = None,
    batch_size: Optional[int] = None,
    max_workers: Optional[int] = None,
    solution_pattern: str = '\\\\boxed{(.*?)}',
    trace_pattern: Optional[str] = None
):
```

Initialize the self-improving cot pipeline.

**Parameters:**

- **reason_agent** (ChatAgent): The chat agent used for generating and improving reasoning traces.
- **problems** (List[Dict]): List of problem dictionaries to process.
- **max_iterations** (int, optional): Maximum number of improvement iterations. If set to `0`, the pipeline will generate an initial trace without any improvement iterations. (default: :obj:`3`)
- **score_threshold** (Union[float, Dict[str, float]], optional): Quality threshold. Can be either a single float value applied to average score, or a dictionary mapping score dimensions to their thresholds. For example: `{"correctness": 0.8, "coherence": 0.7}`. If using reward model and threshold for a dimension is not specified, will use the default value 0.7. (default: :obj:`0.7`)
- **rejection_sampling_n** (int, optional): Specifies the number of samples to be drawn using the rejection sampling method, where samples are accepted or rejected based on a predefined condition to achieve a desired distribution. (default: :obj: `None`)
- **evaluate_agent** (Optional[ChatAgent]): The chat agent used for evaluating reasoning traces. (default: :obj:`None`)
- **reward_model** (BaseRewardModel, optional): Model used to evaluate reasoning traces. If `None`, uses Agent self-evaluation. (default: :obj:`None`)
- **output_path** (str, optional): Output path for saving traces. If `None`, results will only be returned without saving to file. (default: :obj:`None`)
- **few_shot_examples** (str, optional): Examples to use for few-shot generation. (default: :obj:`None`)
- **batch_size** (int, optional): Batch size for parallel processing. (default: :obj:`None`)
- **max_workers** (int, optional): Maximum number of worker threads. (default: :obj:`None`)
- **solution_pattern** (str, optional): Regular expression pattern with one capture group to extract answers from solution text. (default: :obj:`r'\\boxed{(.*?)}'`)
- **trace_pattern** (str, optional): Regular expression pattern with one capture group to extract answers from trace text. If `None`, uses the same pattern as solution_pattern. (default: :obj:`None`)

<a id="camel.datagen.self_improving_cot.SelfImprovingCoTPipeline.safe_write_json"></a>

### safe_write_json

```python
def safe_write_json(self, file_path, data):
```

<a id="camel.datagen.self_improving_cot.SelfImprovingCoTPipeline.clean_json"></a>

### clean_json

```python
def clean_json(self, data):
```

<a id="camel.datagen.self_improving_cot.SelfImprovingCoTPipeline._check_score_threshold"></a>

### _check_score_threshold

```python
def _check_score_threshold(self, scores: Dict[str, float]):
```

Check if scores meet the threshold requirements.

**Parameters:**

- **scores** (Dict[str, float]): Dictionary of scores for different dimensions.

**Returns:**

  bool: True if scores meet threshold requirements, False otherwise.

<a id="camel.datagen.self_improving_cot.SelfImprovingCoTPipeline._generate_feedback"></a>

### _generate_feedback

```python
def _generate_feedback(self, scores: Dict[str, float]):
```

Generate feedback based on which dimensions need improvement.

**Parameters:**

- **scores** (Dict[str, float]): Dictionary of scores for different dimensions.

**Returns:**

  str: Feedback message indicating which dimensions need improvement.

<a id="camel.datagen.self_improving_cot.SelfImprovingCoTPipeline.generate_reasoning_trace"></a>

### generate_reasoning_trace

```python
def generate_reasoning_trace(self, problem: str):
```

Generate initial reasoning trace for a given problem.

**Parameters:**

- **problem** (str): The problem text to generate reasoning for.

**Returns:**

  str: Generated reasoning trace.

<a id="camel.datagen.self_improving_cot.SelfImprovingCoTPipeline.evaluate_trace"></a>

### evaluate_trace

```python
def evaluate_trace(
    self,
    problem: str,
    trace: str,
    solution: Optional[str] = None
):
```

Evaluate the quality of a reasoning trace.

**Parameters:**

- **problem** (str): The original problem text to evaluate against.
- **trace** (str): The reasoning trace to evaluate.
- **solution** (Optional[str]): The solution to the problem, if provided. (default: :obj:`None`)

**Returns:**

  Dict[str, Any]: Evaluation results containing:
- scores: Dict of evaluation dimensions and their scores
- feedback: Detailed feedback for improvement

For Agent self-evaluation, the scores will include:
- correctness: Score for logical correctness
- clarity: Score for clarity of explanation
- completeness: Score for completeness of reasoning

For reward model evaluation, the scores will depend on
the model's evaluation dimensions.

<a id="camel.datagen.self_improving_cot.SelfImprovingCoTPipeline.generate_reasoning_trace_rejection"></a>

### generate_reasoning_trace_rejection

```python
def generate_reasoning_trace_rejection(self, problem: str):
```

Generate multiple candidate reasoning traces for a problem and
select the best one based on evaluation.

**Parameters:**

- **problem** (str): The problem text for generating a reasoning trace.

**Returns:**

  str: The best candidate trace that meets quality criteria, or the
first candidate if none qualify.

<a id="camel.datagen.self_improving_cot.SelfImprovingCoTPipeline.improve_trace"></a>

### improve_trace

```python
def improve_trace(
    self,
    problem: str,
    trace: str,
    feedback: str,
    solution: Optional[str] = None
):
```

Generate improved reasoning trace based on feedback.

**Parameters:**

- **problem** (str): The original problem text.
- **trace** (str): The current reasoning trace.
- **feedback** (str): Feedback for improving the trace.
- **solution** (Optional[str]): The solution to the problem, if provided. (default: :obj:`None`)

**Returns:**

  str: Improved reasoning trace.

<a id="camel.datagen.self_improving_cot.SelfImprovingCoTPipeline.validate_problem_format"></a>

### validate_problem_format

```python
def validate_problem_format(self, problem: Dict):
```

Validate that a problem dictionary has the required format.

**Parameters:**

- **problem** (Dict): Problem dictionary to validate.

<a id="camel.datagen.self_improving_cot.SelfImprovingCoTPipeline._check_boxed_answers"></a>

### _check_boxed_answers

```python
def _check_boxed_answers(self, solution: str, trace: str):
```

Check if the answer in the trace matches the solution using the
configured patterns.

**Parameters:**

- **solution** (str): The problem solution string.
- **trace** (str): The reasoning trace string.

**Returns:**

  bool: True if answers match, False otherwise

<a id="camel.datagen.self_improving_cot.SelfImprovingCoTPipeline.process_problem"></a>

### process_problem

```python
def process_problem(self, problem: Dict, rationalization: bool = False):
```

Process a single problem through the self-improving cot pipeline.

**Parameters:**

- **problem** (Dict): Problem dictionary containing the problem text.
- **rationalization** (bool, optional): Whether to use rationalization. (default: :obj:`False`)

**Returns:**

  ProblemResult: Results with final trace and history.

<a id="camel.datagen.self_improving_cot.SelfImprovingCoTPipeline.generate"></a>

### generate

```python
def generate(self, rationalization: bool = False):
```

Execute the self-improving cot pipeline on all problems.

Process problems and return results. If output_path is specified,
also save results to file.

**Parameters:**

- **rationalization** (bool, optional): Whether to use rationalization. (default: :obj:`False`)

**Returns:**

  List[Dict[str, Any]]: List of processed results
